Abstract


The trust region problem, minimization of a quadratic function subject to a spherical trust region constraint, occurs in many optimization algorithms. In a previous paper, the authors introduced an inexpensive approximate solution technique for this problem that involves the solution of a two-dimensional trust region problem. They showed that using this approximation in an unconstrained optimization algorithm leads to the same theoretical global and local convergence properties as are obtained using the exact solution to the trust region problem. This paper reports computational results showing that the two-dimensional minimization approach gives nearly optimal reductions in the n-dimensional quadratic model over a wide range of test cases. We also show that there is very little difference, in efficiency and reliability, between using the approximate and the exact trust region step in solving standard test problems for unconstrained optimization. These results may encourage the application of similar approximate trust region techniques in other contexts.
1.  Introduction.

In  this  paper  we  consider  the  problem 

minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta \}$,  (1.1)

where $g \in R^n$, $B \in R^{n \times n}$ is symmetric, and $\Delta > 0$. Problems of this type typically arise in trust region algorithms for unconstrained optimization. We report theoretical and computational results comparing approximate and exact solution techniques for (1.1). Our results show that an inexpensive approximate solution technique of Shultz, Schnabel, and Byrd [1985] appears to perform almost as well as the more expensive exact method in practice. These results appear to have interesting ramifications for the solution of trust region problems in several contexts.

In the context of unconstrained optimization, the quadratic function being minimized in (1.1) is an approximation to the Taylor series of the objective function at the current iterate, where $g$ is the gradient of the objective function at the current iterate and $B$ is some approximation to the Hessian at the current iterate. In view of this, we will refer to a function $M(d) = g^T d + \tfrac{1}{2} d^T B d$ as a quadratic model and speak of reducing the value of the quadratic model. Instead of just minimizing this approximation to the Taylor series, a trust region algorithm constrains the length of the step to be at most some adjustable parameter $\Delta$, recognizing that the approximation is only accurate in some neighborhood of the current iterate. Then, the solution or approximate solution to (1.1) is used as the trial step to move to the next iterate. Thus, in a trust region algorithm, the main source of computational effort, apart from the function evaluations required, is the work on a problem of the form of (1.1) to determine the step from the current iterate.


Trust region algorithms differ in their strategies for approximately solving (1.1). An early trust region algorithm is the single dogleg algorithm of Powell [1970]. This algorithm takes as its approximate solution to (1.1) the step $s$, with $\|s\| \le \Delta$, on the piecewise linear curve passing through the origin, the Cauchy point $d = -\frac{\|g\|^2}{g^T B g}\, g$, which gives the lowest value of the quadratic model $g^T s + \tfrac{1}{2} s^T B s$ in the steepest descent direction, and the Newton point $d = -B^{-1} g$, which gives the lowest value of the quadratic model overall. Dennis and Mei [1979] suggest a similar strategy, but with a modified double dogleg curve that is biased toward the Newton point. Trust region algorithms of the dogleg type have the disadvantage that they are not intended to deal with the case that $B$ is not positive definite. Note that they constrain $d$ to the two-dimensional subspace spanned by the Newton and steepest descent directions.
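The single dogleg construction described above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name `dogleg_step` is hypothetical, and it assumes a dense, positive definite `B` and a nonzero `g`.

```python
import numpy as np

def dogleg_step(g, B, delta):
    """Single dogleg step (sketch). Assumes B is positive definite and g != 0."""
    newton = -np.linalg.solve(B, g)              # Newton point -B^{-1} g
    if np.linalg.norm(newton) <= delta:
        return newton                            # Newton point is inside the region
    cauchy = -(g @ g) / (g @ B @ g) * g          # Cauchy point
    norm_c = np.linalg.norm(cauchy)
    if norm_c >= delta:
        return (delta / norm_c) * cauchy         # truncated steepest descent step
    # Second leg: find t in (0, 1) with ||cauchy + t (newton - cauchy)|| = delta.
    w = newton - cauchy
    a, b, c = w @ w, 2.0 * (cauchy @ w), cauchy @ cauchy - delta**2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return cauchy + t * w
```

Because `c < 0` on the second leg, the quadratic in `t` has exactly one positive root, which is the point where the curve crosses the trust region boundary.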

The exact solution to (1.1) satisfies $(B + \alpha I) s = -g$ (see Theorem 1). Hebden [1973] and Moré [1977] suggest approximately solving (1.1) by iterating on $\alpha$ to obtain $\alpha > 0$ such that $\|(B + \alpha I)^{-1} g\|$ is approximately equal to $\Delta$, and then taking the step $-(B + \alpha I)^{-1} g$. This approach requires a new factorization of $B + \alpha I$ for each new value of $\alpha$, and thus may require increased algebraic effort in comparison to the dogleg algorithms.
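To make the cost of iterating on $\alpha$ concrete, here is a minimal, unsafeguarded sketch of such an iteration. It applies Newton's method to $1/\Delta - 1/\|p(\alpha)\| = 0$ with $p(\alpha) = -(B+\alpha I)^{-1} g$, a standard update that is assumed here rather than taken from the paper. The name `more_hebden_alpha` is hypothetical, and the sketch assumes that $B$ is positive definite and $\|B^{-1} g\| > \Delta$. Note that every pass through the loop performs a fresh Cholesky factorization.

```python
import numpy as np

def more_hebden_alpha(g, B, delta, rtol=1e-10, max_iter=50):
    """Iterate on alpha until ||(B + alpha I)^{-1} g|| is close to delta (sketch).

    Assumes B is positive definite and ||B^{-1} g|| > delta; no safeguards.
    Returns (alpha, step, number_of_factorizations).
    """
    n = g.size
    alpha, n_fact = 0.0, 0
    for _ in range(max_iter):
        L = np.linalg.cholesky(B + alpha * np.eye(n))    # new factorization per alpha
        n_fact += 1
        # Generic solves are used for brevity; triangular solves would be cheaper.
        p = -np.linalg.solve(L.T, np.linalg.solve(L, g))
        norm_p = np.linalg.norm(p)
        if abs(norm_p - delta) <= rtol * delta:
            break
        q = np.linalg.solve(L, p)                        # ||q||^2 = p^T (B + alpha I)^{-1} p
        alpha += (norm_p / np.linalg.norm(q)) ** 2 * (norm_p - delta) / delta
    return alpha, p, n_fact
```

On a well-conditioned problem the loop typically stops after a handful of factorizations, which is still several times the single factorization a dogleg step needs.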

More recently, Gay [1981], Sorensen [1982], and Moré and Sorensen [1983] have suggested methods that produce steps that attain at least a fixed fraction $r \le 1$ of the minimal value for (1.1). We call such a method "r-optimal." For most problems, these r-optimal methods compute the approximate solution to (1.1) in the same way as the Moré-Hebden methods. In the so-called "hard case," when there is no $\alpha$ such that $B + \alpha I$ is positive semi-definite and $\|(B + \alpha I)^{-1} g\| \ge \Delta$, an r-optimal method requires a direction of negative curvature of $B$ to compute its approximate solution to (1.1). These r-optimal methods have strong theoretical properties, while the computational work required is comparable to that required for the Moré-Hebden approach.


Shultz, Schnabel, and Byrd [1985] present an indefinite dogleg algorithm that achieves the strong theoretical properties of an r-optimal algorithm while retaining the computational efficiency of a dogleg algorithm. The indefinite dogleg algorithm computes an approximate solution to (1.1) by performing a two-dimensional quadratic minimization,

minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta,\ d \in [u, v] \}$,

where $u$ and $v$ are $-B^{-1} g$ and $-g$ if $B$ is positive definite and are chosen from among $-g$, $-(B + \alpha I)^{-1} g$, and a negative curvature direction when $B$ is indefinite. Note that this approach, by exact minimization in a two-dimensional subspace, will always produce a step that reduces the quadratic model by at least as much as other dogleg type algorithms when $B$ is positive definite. The additional cost of the two-dimensional minimization over a dogleg approach is simply the work required to exactly solve (1.1) where $n = 2$, and is thus negligible.

The main purpose of this paper is to report some perhaps surprising computational evidence that minimization over a subspace spanned by two reasonably chosen directions tends to produce a high percentage of the value given by minimizing exactly over all of $R^n$.

In Section 2 we briefly compare the theory of r-optimal algorithms and algorithms of the type considered by Shultz, Schnabel, and Byrd [1985]. In Section 3 we briefly describe the algorithm tested. Section 4 describes our tests and reports on their results. Finally, in Section 5 we comment on implications of these results.

2.  Theoretical  Comparison  of  Exact  and  Approximate  Trust  Region  Algorithms. 

This section discusses the percentage of the optimal value of (1.1) that is achieved by any approximate method in a class proposed by Shultz, Schnabel, and Byrd [1985].

First, we give some definitions.

$\|\cdot\|$ is the Euclidean norm on $R^n$.

For a symmetric $B \in R^{n \times n}$, let $\lambda_1(B)$ be the smallest eigenvalue of $B$, and let $v_1(B) \in R^n$ be an eigenvector of $B$ corresponding to $\lambda_1(B)$. For notational convenience, we will sometimes suppress the dependence on $B$.

For any $A \in R^{n \times n}$, let $A^+$ denote the generalized inverse of $A$.

For any $u_1, u_2, \ldots, u_m \in R^n$, let $[u_1, u_2, \ldots, u_m]$ denote the subspace of $R^n$ spanned by $u_1, u_2, \ldots, u_m$.

A function $s : R^n \times R^{n \times n} \times (0, \infty) \to R^n$ is called a step computing function, typically denoted by $s(g, B, \Delta)$.

For any $g \in R^n$, symmetric $B \in R^{n \times n}$, and $\Delta > 0$, let $s_*(g, B, \Delta)$ be a solution to (1.1). Such an $s_*$ is referred to as an optimal step computing function.

For $s \in R^n$, let $pred(s, g, B) = -s^T g - \tfrac{1}{2} s^T B s$.

For $r \in (0, 1]$, a step computing function $s$ is r-optimal if for any $g \in R^n$, symmetric $B \in R^{n \times n}$, and $\Delta > 0$,

$pred(s(g, B, \Delta), g, B) \ge r \, pred(s_*(g, B, \Delta), g, B)$.
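The quantities just defined translate directly into code. The following sketch evaluates the predicted reduction and the fraction of the optimal reduction that a given step attains. The function names are hypothetical, and the optimal step is supplied by the caller rather than computed.

```python
import numpy as np

def pred(s, g, B):
    """Predicted reduction of the quadratic model: -s^T g - 0.5 s^T B s."""
    return -(s @ g) - 0.5 * (s @ B @ s)

def fraction_of_optimal(s, s_star, g, B):
    """Ratio pred(s) / pred(s_star); a step computing function is r-optimal
    if this ratio is at least r for every g, B, and delta."""
    return pred(s, g, B) / pred(s_star, g, B)
```

For instance, with $B = \mathrm{diag}(1, 4)$, $g = (1, 1)^T$, and a trust region large enough to contain the Newton step, the Cauchy step attains the fraction $0.4 / 0.625 = 0.64$ of the optimal reduction.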

We will now state a theorem characterizing the solution to (1.1). See, for example, Sorensen [1982] for a proof of this result. This result will provide the theoretical basis for our step computing function as well as for r-optimal step computing functions.

Theorem 1.

Consider any symmetric $B \in R^{n \times n}$, $g \in R^n$, and $\Delta > 0$. Let $s_* \in R^n$ be a solution to (1.1). If $B$ is positive definite and $\|B^{-1} g\| \le \Delta$, then $s_* = -B^{-1} g$.

If $B$ is positive definite and $\|B^{-1} g\| > \Delta$, then for the unique $\alpha > 0$ such that $\|(B + \alpha I)^{-1} g\| = \Delta$, $s_* = -(B + \alpha I)^{-1} g$.

If $\lambda_1 \le 0$ and there is an $\alpha > -\lambda_1$ such that $\|(B + \alpha I)^{-1} g\| = \Delta$, then $s_* = -(B + \alpha I)^{-1} g$. Otherwise, $s_* = -(B - \lambda_1 I)^+ g + \xi v_1$, where $\xi$ is such that $\|-(B - \lambda_1 I)^+ g + \xi v_1\| = \Delta$.
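For small dense problems, the case analysis of Theorem 1 can be followed literally using a full eigendecomposition. The sketch below does this. It is meant only to illustrate the theorem and is not one of the factorization-based methods cited above. The name `solve_tr_exact`, the bisection on $\alpha$, and the tolerances used to detect the hard case are all assumptions made for illustration.

```python
import numpy as np

def solve_tr_exact(g, B, delta):
    """Minimize g^T d + 0.5 d^T B d over ||d|| <= delta (dense sketch of Theorem 1)."""
    lam, V = np.linalg.eigh(B)            # ascending eigenvalues, orthonormal eigenvectors
    gt = V.T @ g                          # g expressed in the eigenvector basis
    lam1 = lam[0]
    if lam1 > 0:
        y = -gt / lam                     # Newton step
        if np.linalg.norm(y) <= delta:
            return V @ y
    else:
        # Hard case: g is orthogonal to the eigenspace of lam1 and the
        # pseudo-inverse step -(B - lam1 I)^+ g lies inside the region.
        top = lam - lam1 > 1e-12 * max(1.0, np.abs(lam).max())
        if np.linalg.norm(gt[~top]) <= 1e-12 * np.linalg.norm(g):
            y = np.zeros_like(gt)
            y[top] = -gt[top] / (lam[top] - lam1)
            norm_y = np.linalg.norm(y)
            if norm_y <= delta:
                y[0] = np.sqrt(delta**2 - norm_y**2)   # xi along v_1
                return V @ y
    # Boundary solution: bisect on alpha so that ||(B + alpha I)^{-1} g|| = delta.
    lo, hi = max(0.0, -lam1), -lam1 + np.linalg.norm(g) / delta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(gt / (lam + mid)) > delta:
            lo = mid
        else:
            hi = mid
    return -V @ (gt / (lam + hi))
```

The upper bracket follows from $\|(B + \alpha I)^{-1} g\| \le \|g\| / (\alpha + \lambda_1)$. The $O(n^3)$ eigendecomposition makes this approach unattractive for large $n$, but it is convenient as a reference solution.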

It is clear from Theorem 1 that any method that attempts to closely approximate the solution to (1.1) must approximate $\lambda_1(B)$ reasonably well in the case that $B$ is not positive definite, and must also produce some direction of negative curvature in the final case in Theorem 1, which Moré and Sorensen designate the "hard case." This case can only arise if $g$ is orthogonal to every eigenvector of $B$ corresponding to the eigenvalue $\lambda_1$.

We now discuss the conditions on a step computing function presented in Shultz, Schnabel, and Byrd [1985]. They show that a trust region algorithm for minimization that uses a step computing function satisfying these conditions has the same global and local convergence properties as an algorithm using an optimal or r-optimal step computing function. But, as we will show shortly, these conditions are slightly weaker than the r-optimality condition in another sense.

The conditions on a step computing function $s(g, B, \Delta)$ are given below. The first condition, originally due to Powell [1970], roughly says that the step provides at least a fixed fraction of the decrease in the quadratic model that would be obtained from the best permissible step in the steepest descent direction. The second condition roughly says that when $B$ is indefinite, the step provides at least a fixed fraction of the decrease in the quadratic model that would be obtained from the best permissible step in the direction of greatest negative curvature. The third condition simply says that if $B$ is positive definite and the Newton step is permissible, then it is chosen.


Conditions on a Step Computing Function.

Condition #1

There are $c_1, \bar{c}_1 > 0$ such that for all $g \in R^n$, for all symmetric $B \in R^{n \times n}$, and for all $\Delta > 0$,

$pred(s(g, B, \Delta), g, B) \ge c_1 \|g\| \min\{ \Delta,\ \bar{c}_1 \frac{\|g\|}{\|B\|} \}$.

Condition #2

There is a $c_2 > 0$ such that for all $g \in R^n$, for all symmetric $B \in R^{n \times n}$, and for all $\Delta > 0$,

$pred(s(g, B, \Delta), g, B) \ge c_2 (-\lambda_1(B)) \Delta^2$.

Condition #3

If $B \in R^{n \times n}$ is positive definite and $\|-B^{-1} g\| \le \Delta$, then $s(g, B, \Delta) = -B^{-1} g$.

The step computing function for which we present test results in Section 4 satisfies these three conditions. Example 1 below shows that when $B$ is positive definite, a step satisfying these three conditions may not be r-optimal, for any $r > 0$. The step computing function used in this example is exactly the one used in our computational tests in Section 4, when $B$ is positive definite. The second example given below shows that when $B$ is indefinite, a step computing function satisfying Conditions #1 and #2 also may not be r-optimal, for any $r > 0$. The step computing function used in this example, minimization over the subspace spanned by the gradient and a negative curvature direction, is the simplest function that satisfies Conditions #1 and #2 in the indefinite case.

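Example 1 (positive definite case)

Let $\epsilon > 0$, $B = \mathrm{diag}(1, \epsilon^2, \epsilon^4)$, $g = (\epsilon^2, \epsilon^2, \epsilon^3)^T$, $\alpha = \epsilon^2$, and $\Delta = \|(B + \alpha I)^{-1} g\|$. Then the solution to (1.1) is

$s_* = -(B + \alpha I)^{-1} g = \left( \frac{-\epsilon^2}{1 + \epsilon^2},\ \frac{-\epsilon^2}{\epsilon^2 + \epsilon^2},\ \frac{-\epsilon^3}{\epsilon^4 + \epsilon^2} \right)^T = \left( \frac{-\epsilon^2}{1 + \epsilon^2},\ -\tfrac{1}{2},\ \frac{-\epsilon}{1 + \epsilon^2} \right)^T$,

and $pred(s_*, g, B) = O(\epsilon^2)$. Define a step computing function by taking $s(g, B, \Delta)$ to be the solution to

minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta,\ d \in [-g, -B^{-1} g] \}$,

which is the step to the minimum of the quadratic model subject to the trust region in the two-dimensional subspace spanned by the gradient direction and the Newton direction. Since this step will do at least as well as the best gradient step, it is easy to see that $s(g, B, \Delta)$ satisfies Condition #1. Condition #2 is vacuously satisfied, since $\lambda_1(B) > 0$. Note that $\Delta = O(1)$, $\|g\| = O(\epsilon^2)$, $g^T B g = O(\epsilon^4)$, $g^T B^{-1} g = O(\epsilon^2)$, and $\|B^{-1} g\| = O(1/\epsilon)$.

For given $g$, $B$, and $\Delta$, $s(g, B, \Delta) = \mu g + \nu B^{-1} g$ for some $\mu, \nu \in R$. Then

$pred(s(g, B, \Delta), g, B) = -\mu g^T g - \nu g^T B^{-1} g - \tfrac{1}{2} \mu^2 g^T B g - \mu \nu g^T B B^{-1} g - \tfrac{1}{2} \nu^2 g^T B^{-1} B B^{-1} g$

$\le (-\mu - \mu \nu) g^T g - \tfrac{1}{2} \mu^2 g^T B g - \nu g^T B^{-1} g$.

Now, since $\frac{g^T B^{-1} g}{\|g\| \, \|B^{-1} g\|} = O(\epsilon)$, it follows that for all small enough $\epsilon > 0$, $g$ and $B^{-1} g$ are nearly orthogonal and so $\|\nu B^{-1} g\| \le O(\Delta) = O(1)$, and $\|\mu g\| \le O(\Delta) = O(1)$. Thus, $|\nu| \le O(\epsilon)$, $|\mu| \le O(1/\epsilon^2)$, and

$pred(s(g, B, \Delta), g, B) \le -\mu (1 + \nu) O(\epsilon^4) - \tfrac{1}{2} \mu^2 O(\epsilon^4) + O(\epsilon^3)$

$\le (-\mu - \tfrac{1}{2} \mu^2) O(\epsilon^4) - \mu \, O(\epsilon^5) + O(\epsilon^3)$

$\le (-\mu - \tfrac{1}{2} \mu^2) O(\epsilon^4) + O(\epsilon^3)$.

Finally, since $\max_\mu (-\mu - \tfrac{1}{2} \mu^2) = \tfrac{1}{2}$, we have that

$pred(s(g, B, \Delta), g, B) \le O(\epsilon^3)$.

Thus,

$\frac{pred(s(g, B, \Delta), g, B)}{pred(s_*, g, B)} = O(\epsilon)$.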


Example 2 (indefinite case)

Let $\epsilon > 0$, $B = \mathrm{diag}(-\epsilon^2, \epsilon, 1)$, $g = (0, \epsilon, \epsilon)^T$, $q = (1, 0, 0)^T$, $\alpha = 2\epsilon^2$, and $\Delta = \|(B + \alpha I)^{-1} g\|$. Then the solution to (1.1) is

$s_* = -(B + \alpha I)^{-1} g = \left( 0,\ \frac{-\epsilon}{\epsilon + 2\epsilon^2},\ \frac{-\epsilon}{1 + 2\epsilon^2} \right)^T = \left( 0,\ \frac{-1}{1 + 2\epsilon},\ \frac{-\epsilon}{1 + 2\epsilon^2} \right)^T$,

and $pred(s_*, g, B) = O(\epsilon)$. Now, define a step computing function by taking $s(g, B, \Delta)$ to be the solution to

minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta,\ d \in [g, q] \}$.

Again, this step satisfies Condition #1 since it does at least as well as the best gradient step, and it can easily be shown to satisfy Condition #2 since $q$ is nearly the direction of greatest negative curvature of $B$. The best possible reduction for a step in the direction $q$ is $\tfrac{1}{2} \epsilon^2 \Delta^2 = O(\epsilon^2)$, since $\Delta = O(1)$. Since $\frac{g^T g}{g^T B g} \le \frac{\Delta}{\|g\|}$, it follows easily that the best reduction in the direction $g$ is given by the Cauchy step and is $\tfrac{1}{2} \frac{(g^T g)^2}{g^T B g} = O(\epsilon^2)$. Thus, since $g^T q = g^T B q = 0$, it is clear that $pred(s(g, B, \Delta), g, B) = O(\epsilon^2)$, so

$\frac{pred(s(g, B, \Delta), g, B)}{pred(s_*, g, B)} = O(\epsilon)$.
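Both examples can be checked numerically. The sketch below builds the two problems for a few values of $\epsilon$, minimizes the model over the stated two-dimensional subspace, and reports the fraction of the optimal reduction. All function names are hypothetical. The reduced problem is solved with a small dense eigenvalue-based solver, chosen for illustration only, and the values $\epsilon = 10^{-1}, 10^{-2}, 10^{-3}$ are arbitrary.

```python
import numpy as np

def solve_tr_exact(g, B, delta):
    """Dense exact solve of (1.1) via eigendecomposition (illustrative helper)."""
    lam, V = np.linalg.eigh(B)
    gt, lam1 = V.T @ g, lam[0]
    if lam1 > 0:
        y = -gt / lam
        if np.linalg.norm(y) <= delta:
            return V @ y
    else:
        top = lam - lam1 > 1e-12 * max(1.0, np.abs(lam).max())
        if np.linalg.norm(gt[~top]) <= 1e-12 * np.linalg.norm(g):   # hard case
            y = np.zeros_like(gt)
            y[top] = -gt[top] / (lam[top] - lam1)
            if np.linalg.norm(y) <= delta:
                y[0] = np.sqrt(delta**2 - y @ y)
                return V @ y
    lo, hi = max(0.0, -lam1), -lam1 + np.linalg.norm(g) / delta
    for _ in range(200):                      # bisection on alpha
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(gt / (lam + mid)) > delta:
            lo = mid
        else:
            hi = mid
    return -V @ (gt / (lam + hi))

def subspace_step(g, B, delta, directions):
    """Minimize the model over span(directions) intersected with the trust region."""
    cols = [d / np.linalg.norm(d) for d in directions if np.linalg.norm(d) > 0]
    U, sv, _ = np.linalg.svd(np.column_stack(cols), full_matrices=False)
    Q = U[:, sv > 1e-10 * sv[0]]              # orthonormal basis of the span
    return Q @ solve_tr_exact(Q.T @ g, Q.T @ B @ Q, delta)

def pred(s, g, B):
    return -(s @ g) - 0.5 * (s @ B @ s)

def example1_ratio(eps):
    b = np.array([1.0, eps**2, eps**4])
    B, g, alpha = np.diag(b), np.array([eps**2, eps**2, eps**3]), eps**2
    s_star = -g / (b + alpha)                 # optimal by Theorem 1
    delta = np.linalg.norm(s_star)
    s = subspace_step(g, B, delta, [-g, -g / b])
    return pred(s, g, B) / pred(s_star, g, B)

def example2_ratio(eps):
    b = np.array([-eps**2, eps, 1.0])
    B, g, alpha = np.diag(b), np.array([0.0, eps, eps]), 2 * eps**2
    q = np.array([1.0, 0.0, 0.0])
    s_star = -g / (b + alpha)                 # optimal by Theorem 1
    delta = np.linalg.norm(s_star)
    s = subspace_step(g, B, delta, [g, q])
    return pred(s, g, B) / pred(s_star, g, B)

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, example1_ratio(eps), example2_ratio(eps))
```

If the analysis above is right, both printed ratios should shrink roughly in proportion to $\epsilon$.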


In  both  Examples  1  and  2,  the  condition  number  of  B  approaches  infinity.  Theorem  2 
gives  a  condition  on  the  matrix  B  that  implies  the  r-optimality  of  a  step  computing  function 
satisfying  Conditions  #1  and  #2.  This  condition  is,  roughly,  that  if  B  is  positive  definite,  the 
condition  number  of  B  is  bounded,  and  if  B  is  indefinite,  the  amount  of  negative  curvature  is 
not  negligible. 


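Theorem 2. Suppose $s$ is a step computing function that satisfies Conditions #1 and #2, with constants $c_1, \bar{c}_1, c_2 > 0$. Let $\kappa > 0$, and consider any $g$, $B$, and $\Delta$, with

$\frac{\|B\|}{|\lambda_1(B)|} \le \kappa$.

Then

$\frac{pred(s(g, B, \Delta), g, B)}{pred(s_*, g, B)} \ge \min\left\{ c_1 \min\{1, \frac{2 \bar{c}_1}{\kappa}\},\ \frac{2 \bar{c}_1 c_2}{(2 + \bar{c}_1) \kappa},\ \tfrac{1}{2} \min\{c_1, c_2\} \right\}$,

where $s_*$ is a solution to (1.1) for $g$, $B$, and $\Delta$.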


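Proof. For notational convenience, let $s = s(g, B, \Delta)$, $pred = pred(s, g, B)$, and $pred_* = pred(s_*, g, B)$. First, note that

$\min_{\|w\| \le \Delta} \{ g^T w + \tfrac{1}{2} w^T B w \} \ge \min_{\|w\| \le \Delta} g^T w + \min_{\|w\| \le \Delta} \tfrac{1}{2} w^T B w \ge -\|g\| \Delta + \min\{0, \tfrac{1}{2} \lambda_1(B) \Delta^2\}$.

Thus,

$pred_* \le \|g\| \Delta + \max\{0, \tfrac{1}{2} (-\lambda_1(B)) \Delta^2\}$.  (2.1)

Now, if $\lambda_1(B) > 0$, then by (2.1)

$pred_* \le \|g\| \Delta$,

and since the Newton step is the global minimum of $g^T w + \tfrac{1}{2} w^T B w$ in this case,

$pred_* \le pred(-B^{-1} g, g, B) = \tfrac{1}{2} g^T B^{-1} g \le \tfrac{1}{2} \|g\|^2 \|B^{-1}\| = \tfrac{1}{2} \|B^{-1}\| \, \|B\| \frac{\|g\|^2}{\|B\|} = \tfrac{1}{2} \frac{\|B\|}{\lambda_1(B)} \frac{\|g\|^2}{\|B\|} \le \tfrac{1}{2} \kappa \frac{\|g\|^2}{\|B\|}$.

Thus, by Condition #1,

$\frac{pred}{pred_*} \ge \frac{c_1 \min\{ \|g\| \Delta,\ \bar{c}_1 \|g\|^2 / \|B\| \}}{\min\{ \|g\| \Delta,\ \tfrac{1}{2} \kappa \|g\|^2 / \|B\| \}} \ge c_1 \min\{1, \frac{2 \bar{c}_1}{\kappa}\}$.  (2.2)

Otherwise, $\lambda_1(B) \le 0$. Suppose first that $\Delta \ge \bar{c}_1 \frac{\|g\|}{\|B\|}$. Then by (2.1), and since $|\lambda_1(B)| \le \|B\|$,

$pred_* \le \|g\| \Delta + \tfrac{1}{2} \|B\| \Delta^2 \le \frac{1}{\bar{c}_1} \|B\| \Delta^2 + \tfrac{1}{2} \|B\| \Delta^2 = (\tfrac{1}{2} + \frac{1}{\bar{c}_1}) \|B\| \Delta^2$.

Thus, by Condition #2,

$\frac{pred}{pred_*} \ge \frac{c_2 |\lambda_1(B)| \Delta^2}{(\tfrac{1}{2} + \frac{1}{\bar{c}_1}) \|B\| \Delta^2} \ge \frac{c_2}{(\tfrac{1}{2} + \frac{1}{\bar{c}_1}) \kappa} = \frac{2 \bar{c}_1 c_2}{2 + \bar{c}_1} \cdot \frac{1}{\kappa}$.  (2.3)

Finally, if $\Delta < \bar{c}_1 \frac{\|g\|}{\|B\|}$, then by Conditions #1 and #2, $pred \ge c_1 \|g\| \Delta$, and $pred \ge c_2 |\lambda_1(B)| \Delta^2$, so by (2.1)

$\frac{pred}{pred_*} \ge \tfrac{1}{2} \frac{c_1 \|g\| \Delta + c_2 |\lambda_1(B)| \Delta^2}{\|g\| \Delta + \tfrac{1}{2} |\lambda_1(B)| \Delta^2} \ge \tfrac{1}{2} \min\{c_1, c_2\}$.  (2.4)

The conclusion now follows from (2.2), (2.3), and (2.4). □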


3.  The  Step  Computing  Function. 


We now briefly describe the step computing function used for the tests reported in Section 4. It is described in more detail in Shultz, Schnabel, and Byrd [1985]. They show that it satisfies the conditions on a step computing function given in Section 2. As shown by Example 1, however, it is not an r-optimal step computing function. We will refer to this function as the indefinite dogleg step computing function.

When $B$ is positive definite, our step computing function chooses $d$ to minimize the quadratic model over the subspace spanned by the steepest descent and Newton directions, subject to the trust region constraint.

In the indefinite case, our step computing function computes an $\alpha \in (-\lambda_1, -2\lambda_1]$, and a negative curvature direction $v$ with $\frac{v^T B v}{v^T v} \in [\lambda_1, \tfrac{1}{2} \lambda_1]$. These $\alpha$ and $v$ are obtained by using the Lanczos method (see, e.g., Parlett [1980]) to approximate an eigenvalue of $B$, hopefully $\lambda_1$, to within a relative error of 0.1. We then set $\alpha$ equal to a constant (2 in the implementation tested) multiplied by the approximation from the Lanczos method, and use the Cholesky factorization to factor $B + \alpha I$ and determine whether it is positive definite. If the Cholesky factorization fails, then the eigenvalue approximated by the Lanczos method was not $\lambda_1$; in this case a better direction of negative curvature is obtained from the Cholesky decomposition and the Lanczos method is restarted using this direction.

The step computing method always starts by attempting to do the Cholesky factorization of $B$. Thus, if $B$ is positive definite, we immediately obtain the factorization of $B$ needed to compute $-B^{-1} g$. The results in Section 4 show that an average of roughly 1.1 matrix factorizations per iteration are required by this method.

The Indefinite Dogleg Step Computing Function.

Let $\rho$ be given. Given $g \in R^n$, symmetric $B \in R^{n \times n}$, and $\Delta > 0$, if $B$ is not positive definite, compute $\alpha \in (-\lambda_1, -2\lambda_1]$ and $v \in R^n$ such that $\frac{v^T B v}{v^T v} \le \rho \lambda_1$.

If $B$ is positive definite then $s(g, B, \Delta)$ is the solution to

minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta,\ d \in [-g, -B^{-1} g] \}$.  (3.1)

If $B$ is not positive definite and $\|(B + \alpha I)^{-1} g\| \ge \Delta$ then $s(g, B, \Delta)$ is the solution to

minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta,\ d \in [-g, -(B + \alpha I)^{-1} g] \}$.  (3.2)

If $B$ is not positive definite and $\|(B + \alpha I)^{-1} g\| < \Delta$ then $s(g, B, \Delta)$ is

$-(B + \alpha I)^{-1} g + \xi v$,  (3.3)

where $\xi$ is selected so that $\|s(g, B, \Delta)\| = \Delta$ and $\xi v^T (B + \alpha I)^{-1} g \le 0$.
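The three branches (3.1), (3.2), and (3.3) can be sketched as follows. This is a dense illustration under stated assumptions, not the implementation tested in the paper. A full eigendecomposition stands in for the Cholesky test and the Lanczos iteration, so $v$ is an exact eigenvector and $\alpha = -2\lambda_1$. The nearly semi-definite variant is omitted, so $\lambda_1(B) \ne 0$ is assumed. All function names are hypothetical.

```python
import numpy as np

def solve_tr_exact(g, B, delta):
    """Dense exact solve of (1.1) via eigendecomposition (illustrative helper)."""
    lam, V = np.linalg.eigh(B)
    gt, lam1 = V.T @ g, lam[0]
    if lam1 > 0:
        y = -gt / lam
        if np.linalg.norm(y) <= delta:
            return V @ y
    else:
        top = lam - lam1 > 1e-12 * max(1.0, np.abs(lam).max())
        if np.linalg.norm(gt[~top]) <= 1e-12 * np.linalg.norm(g):   # hard case
            y = np.zeros_like(gt)
            y[top] = -gt[top] / (lam[top] - lam1)
            if np.linalg.norm(y) <= delta:
                y[0] = np.sqrt(delta**2 - y @ y)
                return V @ y
    lo, hi = max(0.0, -lam1), -lam1 + np.linalg.norm(g) / delta
    for _ in range(200):                      # bisection on alpha
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(gt / (lam + mid)) > delta:
            lo = mid
        else:
            hi = mid
    return -V @ (gt / (lam + hi))

def subspace_step(g, B, delta, directions):
    """Minimize the model over span(directions) intersected with the trust region."""
    cols = [d / np.linalg.norm(d) for d in directions if np.linalg.norm(d) > 0]
    if not cols:
        return np.zeros_like(g, dtype=float)
    U, sv, _ = np.linalg.svd(np.column_stack(cols), full_matrices=False)
    Q = U[:, sv > 1e-10 * sv[0]]
    return Q @ solve_tr_exact(Q.T @ g, Q.T @ B @ Q, delta)

def indefinite_dogleg(g, B, delta, factor=2.0):
    """Return (step, type) with type 'P', 'I', or 'H' as in (3.1)-(3.3)."""
    lam, V = np.linalg.eigh(B)                # stand-in for Cholesky test + Lanczos
    lam1, v = lam[0], V[:, 0]
    if lam1 > 0:                              # (3.1): span{-g, -B^{-1} g}
        return subspace_step(g, B, delta, [-g, -np.linalg.solve(B, g)]), "P"
    alpha = -factor * lam1                    # lies in (-lam1, -2 lam1] if lam1 < 0
    p = -np.linalg.solve(B + alpha * np.eye(g.size), g)
    if np.linalg.norm(p) >= delta:            # (3.2): span{-g, -(B + alpha I)^{-1} g}
        return subspace_step(g, B, delta, [-g, p]), "I"
    # (3.3): p + xi v on the boundary, with xi (v^T p) >= 0, which is the
    # condition xi v^T (B + alpha I)^{-1} g <= 0 since p = -(B + alpha I)^{-1} g.
    b, c = v @ p, p @ p - delta**2
    root = np.sqrt(b * b - c)
    xi = -b + root if b >= 0 else -b - root
    return p + xi * v, "H"
```

Using an exact eigenvector for $v$ is stricter than the requirement $v^T B v / v^T v \le \rho \lambda_1$ in the statement above; an approximate negative curvature direction would slot into the same code.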
The three steps (3.1), (3.2), and (3.3) above are denoted by "P", "I", and "H" for "positive definite," "indefinite," and "hard case," respectively, in Table 1. An alternative step in the hard case, perhaps more in keeping with the two-dimensional minimization approach, would be to choose $s(g, B, \Delta)$ as the solution to

minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta,\ d \in [-(B + \alpha I)^{-1} g, v] \}$.

The step (3.3) was chosen instead because its form is closer to the step taken by the optimal algorithm in the hard case.


When $\lambda_1(B)$ is close to 0, the indefinite dogleg step is sometimes computed by a slight variant of the above. The augmentation $\alpha$ is instead calculated by

$\alpha = \frac{pred_g}{c_2 \Delta^2}$,  (3.4)

where

$pred_g$ = minimize $\{ g^T d + \tfrac{1}{2} d^T B d : \|d\| \le \Delta,\ d \in [-g] \}$.

Then $s(g, B, \Delta)$ is calculated by (3.2). This modification, which is needed in theory when $\lambda_1(B) = 0$ and in practice when $\lambda_1(B)$ is close to 0, is explained in detail in Shultz, Schnabel, and Byrd [1985]. We denote a step that uses it by "S", for "(nearly) semi-definite."


4.  Test  Results. 

We now report our test results for the indefinite dogleg step computing function described above. We tested the method both on randomly generated $g$, $B$, and $\Delta$, and on $g$, $B$, and $\Delta$ arising in the context of a trust region algorithm for minimization. Our approach was to compare the decrease in the quadratic model of our step computing function to the optimal decrease. The results detailed below show, perhaps surprisingly, that on the average the indefinite dogleg step computing function attained a high percentage of the optimal decrease. The tests were performed in double precision arithmetic on a DEC VAX 11/780 in the Department of Computer Science of the University of Colorado at Boulder.

We  first  present  results  on  the  behavior  of  our  indefinite  dogleg  step  computing  function 
on  a  large  number  of  randomly  generated  trust  region  problems. 

First we will describe how the trust region problems were generated. Our goal was to generate reasonable problems, and yet to make the problems difficult enough to provide a realistic test of the difficult cases for the indefinite dogleg step computing function. We tested the algorithm on a variety of test sets. Each test set consisted of 25 problems, 5 problems each of dimensions 20, 40, 60, 80, and 100, all generated by the same scheme.

A trust region problem is defined by the symmetric matrix $B$, a vector $g$, and the trust radius $\Delta$. For each test set we first generated $B$ and $g$, then chose optimal steps of one of three forms, and finally set $\Delta$ to be the length of the optimal step. In this way we efficiently generated problems with known optimal solutions. For the first through the nineteenth test sets, the optimal step was selected by choosing an augmentation $\alpha$ and then taking the optimal step to be $-(B + \alpha I)^{-1} g$. For the twentieth test set, values of $\alpha$ and $\xi$ were chosen, and the optimal step was taken to be $-(B + \alpha I)^{-1} g + \xi v_1$, where $v_1$ is a normalized eigenvector for $\lambda_1(B)$. For the last test set, $g$ was taken to be 0, and the optimal step was taken to be $v_1$.

The basic scheme for generating $B$ and $g$ was to first choose the eigenvalues of $B$ from a uniform distribution in some interval, using the random number generator of Schrage [1979]. Then, as in Moré and Sorensen [1983], the diagonal matrix consisting of these eigenvalues was pre- and post-multiplied by orthogonal matrices, and $B$ was taken to be the resulting matrix. Next the components of $g$ were randomly generated in some interval, and then pre-multiplied by the same orthogonal matrices.
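The basic generation scheme for a problem with a known optimal step of the form $-(B + \alpha I)^{-1} g$ might look like the sketch below. Several details are assumptions rather than the procedure used for Table 1: NumPy's generator replaces the Schrage generator, the orthogonal matrix comes from a QR factorization of a Gaussian matrix, and the augmentation is shifted by $\max(0, -\lambda_1)$ so that $B + \alpha I$ is always positive definite. The function names are hypothetical.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))            # fix column signs

def make_problem(n, eig_range, g_range, alpha_range, rng):
    """Return (B, g, delta, s_star, alpha) with s_star = -(B + alpha I)^{-1} g optimal."""
    lam = rng.uniform(*eig_range, size=n)     # eigenvalues of B
    Q = random_orthogonal(n, rng)
    B = Q @ np.diag(lam) @ Q.T
    B = 0.5 * (B + B.T)                       # remove rounding asymmetry
    g = Q @ rng.uniform(*g_range, size=n)     # rotate the raw gradient components
    # Assumed shift: keeps B + alpha I positive definite for any eigenvalue range.
    alpha = max(0.0, -lam.min()) + rng.uniform(*alpha_range)
    s_star = -np.linalg.solve(B + alpha * np.eye(n), g)
    delta = np.linalg.norm(s_star)            # trust radius = length of optimal step
    return B, g, delta, s_star, alpha

rng = np.random.default_rng(0)
B, g, delta, s_star, alpha = make_problem(20, (-1.0, 1.0), (-1.0, 1.0), (0.1, 1.0), rng)
```

By Theorem 1, `s_star` solves (1.1) for the returned `delta`, because `alpha` is positive and `B + alpha I` is positive definite by construction.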

When an interval is given in the second and third columns in Table 1, it is the interval in which the eigenvalues of $B$ and the unmodified components of $g$ are uniformly distributed. When an interval is listed in the fourth column in Table 1, it is the interval in which the values of the optimal augmentation $\alpha$ are uniformly distributed.

For the test sets 1 through 6 given in Table 1, no further modifications to $B$ or $g$ were made. The optimal step $s_*$ was simply taken to be $-(B + \alpha I)^{-1} g$, with $\Delta = \|(B + \alpha I)^{-1} g\|$.

The test sets 7 through 16 were generated in the same way as the first six test sets, but in addition changes were made to either $B$ or $g$, in an attempt to make the problems more difficult. In order to generate problems with a more difficult eigenvalue distribution, the smallest eigenvalue from a uniform distribution over (0,2) was sometimes set to 0, and sometimes switched to the opposite sign. These modifications are designated in Table 1 by a "Z" (for "zero") or an "O" (for "opposite"), respectively, following the interval for the eigenvalue range. In order to generate problems for which the best gradient step tends to do poorly, sometimes the components of $g$ corresponding to positive eigenvalues were chosen to be uniformly distributed in the interval (-1,1), while the components corresponding to negative eigenvalues were chosen to be uniformly distributed in the interval (-0.1,0.1). The intent of this was to make $g$ likely to be a direction with large positive curvature, and hence make the best gradient step tend to give a small reduction of the quadratic model. These problems are designated by a "B" (for "biased") following the interval listed in the gradient component range column.

For the test sets 17 through 19, the eigenvalues were chosen to be normally distributed with mean 0. This was done in order to produce test problems with a non-uniform eigenvalue distribution and to make it more likely that $\lambda_1(B)$ is relatively isolated from any other negative eigenvalues of $B$. The optimal step was taken to be of the form $-(B + \alpha I)^{-1} g$, as for the first 16 test sets. These problems are designated by an "N" (for "normal") appearing in the eigenvalue range column. In order to make these problems more difficult, the gradient components were chosen in the biased way described above, as designated by a "B" appearing in the gradient component range column.

For the test set 20, the component of the gradient corresponding to the smallest eigenvalue of $B$ was set to 0. Then a random value for $\xi$ was chosen uniformly in the interval (0,1), and the optimal step was taken as $s_* = -(B + \alpha I)^{-1} g + \xi v_1$. Since $\Delta$ was chosen as the length of $s_*$ and the eigenvalue range was such that $\lambda_1(B) < 0$ always occurred, this test set consists of problems of the type called the "hard case" by Moré and Sorensen [1983]. Thus they are designated by an "H" (for "hard case") appearing in the gradient component range column.

Test set 21 consisted of saddle-point problems. The gradient was set to 0 and the optimal solution chosen to be the eigenvector of length 1 corresponding to the smallest eigenvalue. These problems are designated in Table 1 by an "S" (for "saddle-point") appearing in the gradient component range column.

Now  we  describe  the  remaining  columns  of  Table  1.  For  each  test  set,  the  fifth  column 
lists  the  type  of  steps  taken  by  the  indefinite  dogleg  step  computing  function,  followed  by  the 
number  of  problems  out  of  the  25  in  that  set  that  each  step  type  was  taken. 

For  each  test  set,  the  sixth  and  seventh  columns  report  the  average  and  minimum  ratios 
of  the  decrease  in  the  quadratic  model  obtained  by  the  indefinite  dogleg  step  divided  by  the 
decrease  obtained  by  the  optimal  step.  The  eighth  column  reports  the  average  ratio  of  the 
decrease  obtained  by  the  best  gradient  step  to  the  optimal  decrease.  This  is  included  in  order 
to  show  that  the  gradient  direction  alone  does  not  do  particularly  well  on  these  test  problems. 


Table  1 


Fraction  of  Optimal  Reduction  Obtained  by 
the  Indefinite  Dogleg  Step  Computing  Function 
on  Randomly  Generated  Trust  Region  Problems 
Notes

1)  In eigenvalue range column, "O" means smallest eigenvalue is switched to the opposite sign; "Z" means smallest eigenvalue is set to 0; "N" means eigenvalues are normally distributed.

2)  In gradient component range column, "B" means that the gradient components are taken in (-0.1,0.1) if corresponding eigenvalue is negative, taken in (-1,1) otherwise; "H" means that gradient component corresponding to most negative eigenvalue is set to 0; "S" means that g=0.

3)  In step type column,

"P" means that step was of the form (3.1),

"I" means that step was of the form (3.2),

"H" means that step was of the form (3.3),

"S" means that step was of the form (3.2) with $\alpha$ given by (3.4).


We observe from Table 1 that the indefinite dogleg step computing function obtains a
very high fraction of the optimal reduction in the quadratic model. In fact, over all the test sets
in Table 1, the lowest average fraction of the optimal reduction obtained is 0.91, in test set 5,
and all of the other average fractions are greater than 0.95. Note also that the Newton step is
never the optimal step in these tests, so it is not the case that the fraction is high simply
because Newton steps are weighting it toward 1. Indeed, the smallest fraction of the optimal
decrease obtained by any indefinite dogleg step in any of these tests is 0.6, in test set 1. These
results indicate that the indefinite dogleg step computing function does quite well at solving (1.1),
including problems where B is fairly large and non-sparse, even though it only computes the
step in a two-dimensional subspace and is not r-optimal in theory.
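The fraction of the optimal reduction discussed above can be computed for any feasible step. The sketch below is a simplified stand-in, not the indefinite dogleg step of Section 3: it minimizes the model exactly over the two-dimensional subspace spanned by g and a shifted Newton direction, and compares the result with the n-dimensional optimum. The function names, the shift rule, and the bisection solver are assumptions made for illustration, and the hard case is not handled.

```python
import numpy as np

def model(g, B, d):
    """Quadratic model value g^T d + 0.5 d^T B d from (1.1)."""
    return g @ d + 0.5 * d @ B @ d

def exact_step(g, B, delta):
    """Minimize the model over ||d|| <= delta using an eigendecomposition.

    Assumption: the hard case (g orthogonal to the eigenvectors of the
    smallest eigenvalue) does not occur.
    """
    lam, Q = np.linalg.eigh(B)               # eigenvalues in ascending order
    c = Q.T @ g
    if lam[0] > 0:
        d = -Q @ (c / lam)                   # Newton step
        if np.linalg.norm(d) <= delta:
            return d
    # Otherwise find mu > max(0, -lam[0]) with ||(B + mu I)^{-1} g|| = delta.
    def step_norm(mu):
        return np.linalg.norm(c / (lam + mu))
    lo = max(0.0, -lam[0])
    hi = lo + 1.0
    while step_norm(hi) > delta:             # bracket the root from above
        hi = lo + 2.0 * (hi - lo)
    for _ in range(200):                     # bisection; step_norm decreases in mu
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > delta:
            lo = mid
        else:
            hi = mid
    return -Q @ (c / (lam + hi))

def subspace_step(g, B, delta):
    """Exact minimizer over span{g, (B + alpha I)^{-1} g} within the region."""
    lam_min = np.linalg.eigvalsh(B)[0]
    # Illustrative shift that makes B + alpha I positive definite.
    alpha = 0.0 if lam_min > 0 else -1.5 * lam_min + 1e-8
    w = np.linalg.solve(B + alpha * np.eye(len(g)), g)
    V, _ = np.linalg.qr(np.column_stack([g, w]))   # orthonormal n-by-2 basis
    y = exact_step(V.T @ g, V.T @ B @ V, delta)    # 2-dimensional problem
    return V @ y

def reduction_ratio(g, B, delta):
    """Model decrease of the subspace step divided by the optimal decrease."""
    return model(g, B, subspace_step(g, B, delta)) / model(g, B, exact_step(g, B, delta))
```

Because the subspace step is feasible for (1.1), the ratio cannot exceed 1 apart from rounding, and because the subspace contains g it is positive whenever g is nonzero.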

We now present test results for a trust region algorithm for unconstrained minimization
that uses the indefinite dogleg step computing function to generate its iterates. The code that
was tested was a modified version of the code described in Schnabel, Koontz, and Weiss [1985].
The exact gradient and Hessian functions were used.

This algorithm was run on a number of minimization problems in a standard test set in
Moré, Garbow, and Hillstrom [1981]. Table 4 lists the test function numbers and names. The
basic set obtained from these 18 problems consists of three tests for each function, given by
starting the iteration at the standard starting point x_0, at 10x_0, and at 100x_0. Test problems
4, 5, and 10 are badly scaled and were omitted from our tests because our implementation
made no attempt to deal with bad scaling. On some of the remaining test functions, some of
the larger starting points led to overflows at the first iterate or first step, and therefore had to
be discarded.
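As an illustration of how the three starting points per function are formed, the sketch below scales the standard starting point of the two-variable Rosenbrock function (the n = 2 case of test function 14) by powers of 10 and drops any start at which the function value is not finite. The helper names and the finiteness check are assumptions; the report does not say how overflows were detected.

```python
import numpy as np

def rosenbrock(x):
    """Two-variable Rosenbrock function."""
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def scaled_starts(x0, powers=(0, 1, 2)):
    """Return {k: 10**k * x0} for each requested power k."""
    x0 = np.asarray(x0, dtype=float)
    return {k: (10.0 ** k) * x0 for k in powers}

def is_usable(f, x):
    """True if f(x) evaluates to a finite number (no overflow)."""
    with np.errstate(over="ignore", invalid="ignore"):
        return bool(np.isfinite(f(x)))

x0 = np.array([-1.2, 1.0])                   # standard starting point
starts = {k: x for k, x in scaled_starts(x0).items() if is_usable(rosenbrock, x)}
```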

Table  2  contains  the  results  for  the  tests  on  this  standard  test  set.  The  first  two  columns 
contain  the  test  function  numbers  and  the  number  of  variables.  The  third  column  contains  the 
power  of  10  by  which  the  standard  starting  point  is  multiplied.  The  fourth  and  fifth  columns 
contain  the  number  of  iterations  and  function  evaluations,  respectively,  that  were  required  by 
the  minimization  algorithm  using  the  optimal  step  computing  function.  The  sixth  and  seventh 
columns  contain  the  number  of  iterations  and  function  evaluations  for  the  same  minimization 
algorithm  using  the  indefinite  dogleg  step  computing  function.  The  eighth  and  ninth  columns 
contain  the  average  and  minimum  ratios,  respectively,  of  the  reduction  in  the  quadratic  model 
obtained  by  the  indefinite  dogleg  step  compared  to  the  reduction  obtained  by  the  optimal  step 
computing  function.  These  numbers  were  obtained  as  follows.  For  each  step  taken  by  the 
indefinite  dogleg  algorithm,  the  optimal  step  was  computed  for  the  same  quadratic  trust  region 
problem,  and  the  ratio  of  the  reduction  in  the  quadratic  model  by  the  indefinite  dogleg  step  to 
the reduction by the optimal step was computed. Then the average and minimum of these
ratios were calculated over all the iterations. Finally, the tenth column contains the total
number  of  Cholesky  factorizations  attempted  by  the  indefinite  dogleg  minimization  algorithm. 
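The bookkeeping behind columns 8 and 9 can be sketched as below. The function name and the input format (one model reduction per iteration for each step type, both measured on the same trust region problem) are assumptions for illustration.

```python
def summarize_ratios(dogleg_reductions, optimal_reductions):
    """Average and minimum of dogleg/optimal model-reduction ratios.

    Both inputs hold one positive reduction per iteration, computed for
    the same quadratic trust region problem at that iteration.
    """
    ratios = [dl / opt for dl, opt in zip(dogleg_reductions, optimal_reductions)]
    return sum(ratios) / len(ratios), min(ratios)

# Three hypothetical iterations, not data from the report.
average, minimum = summarize_ratios([1.0, 0.8, 0.5], [1.0, 1.0, 1.0])
```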


Table  2 


Performance  of  the  Indefinite  Dogleg  Step  Computing  Function 
in a Trust Region Algorithm for Unconstrained Minimization
on  Standard  Test  Functions  with  Standard  Starting  Points 


 Fn.    n   Start   Optimal step      Indef. dogleg     Ave.    Min.    Chol.
 no.        power   Itns.   F.ev.     Itns.   F.ev.     Redn.   Redn.   fact.

 14     2     0       20      23        22      27       .99     .99      22
              1       44      56        43      55       .99     .99      43
              2      110     146       110     146       .99     .99     110
 15     4     0       15      16        15      16       .99     .94      15
              1       20      21        20      21       .99     .98      20
              2       26      27        26      27       .99     .99      26
 16     2     0        9      11         9      11       .99     .99      11
              1       55      75        57      74       .99     .80      60
 17     4     0       41      51        40      51       .99     .95      42
              1       43      55        45      59       .99     .97      47
              2       50      67        53      67       .99     .95      55
 18     7     0        7       9         7       9       .98     .90       9
 18     8     0       10      15        12      16       .97     .91      15
 18     9     0        9      12         9      12       .97     .81      14
 18    10     0       10      14        10      14       .98     .95

The results in columns 8 and 9 of Table 2 show that the indefinite dogleg step computing
function does a good job of approximately solving (1.1). Note that the very high average
fractions in column 8 are considerably influenced by the presence of a large number of iterates at
which the Newton step was taken. Nonetheless, the minimum values in column 9 show that
the dogleg step does surprisingly well, even in the worst cases. The lowest fraction of the
optimal value obtained for any step of any of the trust region problems encountered is .14, and
the minimum is greater than .80 in 37 of the 43 test problems.


It is also interesting to observe that the indefinite dogleg step minimization algorithm
performs only slightly worse than the optimal step minimization algorithm in terms of
iterations and function evaluations. Further, the average number of Cholesky factorizations
performed by the algorithm on all the Hessian matrices in this test set is only 1.05, and the
average number of Cholesky factorizations on indefinite Hessian matrices is just 1.14.
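To make the factorization count concrete, the sketch below counts Cholesky attempts for one Hessian. It is an assumed strategy, not the one used in the tested code: if the first attempt fails, the matrix is shifted by a Gershgorin bound on its smallest eigenvalue, which guarantees that a second attempt succeeds. The function name and the `margin` parameter are hypothetical.

```python
import numpy as np

def cholesky_with_attempt_count(B, margin=1e-8):
    """Return (L, alpha, attempts) with L L^T = B + alpha I.

    alpha is 0 when B is positive definite; otherwise it comes from a
    Gershgorin lower bound on the smallest eigenvalue of B.
    """
    attempts = 1
    try:
        return np.linalg.cholesky(B), 0.0, attempts
    except np.linalg.LinAlgError:
        pass                                   # B is not positive definite
    diag = np.diag(B)
    radii = np.sum(np.abs(B), axis=1) - np.abs(diag)
    lower_bound = np.min(diag - radii)         # Gershgorin: lambda_min >= lower_bound
    alpha = -lower_bound + margin * max(1.0, np.max(np.abs(B)))
    attempts += 1
    L = np.linalg.cholesky(B + alpha * np.eye(B.shape[0]))
    return L, alpha, attempts

L, alpha, attempts = cholesky_with_attempt_count(np.array([[1.0, 2.0], [2.0, -3.0]]))
```

Under this assumed strategy an indefinite Hessian always costs two attempts, so the reported average of 1.14 suggests that the tested code often obtained a usable factorization from its first attempt.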


Since we were especially interested in how the indefinite dogleg step computing function
would perform in the presence of indefinite Hessians, we formed a second test set by using
several different starting points for each of the problems 2, 12, and 18. These problems were
chosen because they seemed to be fairly difficult, and yielded a considerable number of
indefinite Hessian matrices when used in the tests for Table 2. The starting points for the tests
reported in Table 3 were chosen to be scattered around the standard starting point. Table 3
follows the same format as Table 2, except that the column for the starting point has been
removed.

In  Table  3  we  still  observe  that  the  indefinite  dogleg  step  usually  obtains  a  quite  high 
fraction  of  the  optimal  reduction.  There  are  some  low  minimum  reductions,  but  the  average 
reductions  are  still  very  high.  The  overall  performances  of  the  indefinite  dogleg  and  optimal 
minimization  algorithms  in  terms  of  iterations  and  function  evaluations  are  similar. 

Our results also indicate that there is very little difference in efficiency or reliability
between using the optimal step and using the indefinite dogleg step in an unconstrained
minimization algorithm.

5.  Conclusions. 

Our  computational  results  indicate  that  even  though  the  indefinite  dogleg  step  computing 
function  is  not  r-optimal,  it  usually  obtains  a  quite  high  fraction  of  the  optimal  reduction  in 
the  quadratic  model  in  practice. 

These results may not have great significance in situations in which the algebraic work of
factoring or calculating the eigenvalues of B is negligible. For problems with a small
number of variables, or with a very expensive objective function, the best course of action
might be to simply use an optimal step computing function. Not only does an optimal step
computing function provide the best possible solution to (1.1), but, given the code to compute
the eigenvalue information for B, the optimal step computing function may well be easier to
implement than other step computing functions.


Table 4


List of Standard Test Function Numbers and Names


Test Function Number    Test Function Name

         1              Helical Valley Function
         2              Biggs Exp6 Function
         3              Gaussian Function
         4              Powell Badly Scaled Function
         5              Box 3-Dimensional Function
         6              Variably Dimensioned Function
         7              Watson Function
         8              Penalty Function I
         9              Penalty Function II
        10              Brown Badly Scaled Function
        11              Brown and Dennis Function
        12              Gulf Research and Development Function
        13              Trigonometric Function
        14              Extended Rosenbrock Function
        15              Extended Powell Singular Function
        16              Beale Function
        17              Wood Function
        18              Chebyquad Function

In other situations, however, the observations in this paper are of interest and may
suggest new algorithms. For problems where n is large, the two-dimensional minimization idea
may be useful. Also, there appear to be various situations where a trust region approach
seems attractive but the calculation of the optimal step is impractical. Examples include the
constrained optimization algorithm of Celis, Dennis, and Tapia [1985], and the tensor methods
for nonlinear equations and optimization of Schnabel and Frank [1984, 1986]. In these cases
there is a natural two-dimensional subspace but no obvious dogleg path, so two-dimensional
minimization is the apparent choice. The results of this paper seem to encourage the use of
such strategies.